Skin Cancer Classification Using CNN¶
👋 Introduction¶
Skin cancer is one of the most common cancers worldwide, and early detection is crucial for effective treatment. In this project, we develop a Convolutional Neural Network (CNN) to classify skin lesions using the HAM10000 dataset, which contains images of various skin conditions. By leveraging deep learning, our goal is to build a model that can assist in early diagnosis. This notebook covers the entire process, from data provisioning and model training to evaluation and interpretability.
Table of Contents¶
- 📖 Domain Understanding
- 📦 Data Provisioning
- 📋 Data Requirements
- 🗂️ Data Collection
- 📊 Data Understanding
- 🛠️ Data Preparation
- 🔮 Predictions
- 🧹 Preprocessing
- 🧬 Modelling
- Pre-Trained CNN - EfficientNetB0
- Custom CNN
- Pre-trained CNN - ResNet50
- Pre-trained CNN - ResNet50V2
- Hyperparameter Tuning
- 🧐 Evaluation
- Load Each Trained Model
- Evaluate Models on the Validation Set
- Generate Predictions for Confusion Matrix & Classification Report
- Classification Report
- Correct and Misclassified Images
- Grad-CAM
- Conclusion of Grad-CAM Analysis
- 🤝 Conclusion
- 📚 References
📖 Domain Understanding¶
Skin cancer is one of the most frequently diagnosed cancers worldwide, with millions of new cases reported each year. It occurs when abnormal skin cells grow uncontrollably due to genetic mutations, often triggered by excessive exposure to ultraviolet (UV) radiation. The most common types of skin cancer include basal cell carcinoma (BCC), squamous cell carcinoma (SCC), and melanoma. While BCC and SCC are more common and typically less aggressive, melanoma is highly dangerous due to its potential to spread to other parts of the body if not detected early. Early detection and treatment are critical in preventing severe complications and improving patient survival rates (The Skin Cancer Foundation, 2024).
Challenges in Skin Cancer Diagnosis
- Limited accessibility – In regions with few dermatologists, patients may face delays in screening and diagnosis, increasing the risk of disease progression.
- High cost and time consumption – Conducting biopsies for every suspicious lesion is expensive and time-consuming.
The Role of Deep Learning in Skin Cancer Detection
Recent advancements in AI, particularly Convolutional Neural Networks (CNNs), have shown significant potential in automating skin cancer detection. CNNs are specialized deep learning models designed for image analysis, making them highly effective for medical imaging tasks such as detecting patterns in dermatoscopic images. Unlike traditional machine learning models, CNNs automatically extract and learn important features, such as texture, color, and lesion borders, without requiring manual intervention. Studies have demonstrated that CNN-based models can achieve dermatologist-level accuracy in classifying skin lesions, sometimes even outperforming human experts in distinguishing between benign and malignant cases (Goyal et al., 2019).
However, challenges remain, particularly in ensuring fair performance across diverse populations and improving model interpretability to gain trust in clinical practice.
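As a minimal illustration of the idea (not the architecture trained later in this notebook), a CNN can be expressed in a few Keras layers: stacked convolution and pooling layers learn spatial features, and a softmax head maps them to the seven lesion classes. The layer counts and sizes below are arbitrary placeholders.

```python
# Illustrative toy CNN; layer counts and sizes are placeholders, not the
# models used in this project.
from tensorflow.keras import layers, models

toy_cnn = models.Sequential([
    layers.Input(shape=(128, 128, 3)),          # RGB dermatoscopic image
    layers.Conv2D(32, 3, activation="relu"),    # learn low-level features
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),    # learn higher-level features
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(7, activation="softmax"),      # one probability per lesion class
])
```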
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import os
import cv2
import random
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import tensorflow as tf
from tensorflow.keras.applications import EfficientNetB0, ResNet50, ResNet50V2
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D, Conv2D, MaxPooling2D, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.models import load_model
from sklearn.metrics import confusion_matrix, classification_report
from tensorflow.keras.preprocessing.image import load_img, img_to_array
📦 Data Provisioning¶
This section covers the fundamental aspects of data provisioning. We are going to follow the AI Project Methodology to maintain a structured approach. The key steps in this section are: Data Requirements, Data Collection, Data Understanding, and Data Preparation.
📋 Data Requirements¶
Data Elements¶
- Image Data
- Data Type: Image
- Derivation: Composed of pixel intensity values (RGB images)
- Size: Variable, commonly resized to 128×128 or 224×224 pixels for CNN input
- Content: Dermatoscopic images of skin lesions
- Color Format: RGB (3 channels)
- Categorical Data
- Data Type: Categorical
- Example: Lesion diagnosis labels
- Categories:
- Actinic keratoses (AKIEC)
- Basal cell carcinoma (BCC)
- Benign keratosis-like lesions (BKL)
- Dermatofibroma (DF)
- Melanoma (MEL)
- Melanocytic nevi (NV)
- Vascular lesions (VASC)
- Numerical Data
- Data Type: Numerical
- Example: Patient age
- Units: Years
- Range: 0 to 100+
- Metadata
- Patient ID: Unique identifier for anonymized patient records
- Sex: Male/Female
- Anatomical Site: Location of lesion (e.g., Face, Back, Arm)
- Dataset Source: HAM10000 (ISIC Archive)
Data Volume¶
A dataset of at least 10,000 images is considered sufficient for training deep learning models on this task, giving the CNN enough variety to generalize to unseen cases.
Data Quality Standards¶
To maintain the integrity and effectiveness of the dataset, the following quality standards are required:
- Accuracy: Labels should correctly represent the lesion type, validated by dermatologists.
- Completeness: No missing essential metadata (e.g., diagnosis, patient age, image file).
- Consistency: Image quality should remain uniform in terms of resolution and color distribution.
- Anonymization: Patient identities must be protected following GDPR guidelines.
- Balance: Mitigate class imbalance using techniques like data augmentation or weighted loss functions.
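One of the balancing techniques mentioned above, a weighted loss, can be sketched with scikit-learn's `compute_class_weight`. The label counts below are illustrative stand-ins, not the exact dataset counts.

```python
# Sketch: derive per-class weights inversely proportional to class frequency.
# The label counts below are illustrative placeholders.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

labels = np.array(["nv"] * 6000 + ["mel"] * 1100 + ["df"] * 100)
classes = np.unique(labels)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=labels)
class_weight = dict(zip(classes, weights))
# Rare classes receive larger weights, so their errors cost more during
# training, e.g. via model.fit(..., class_weight=...) with integer-encoded keys.
```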
Data Dictionary¶
- Image
- Data Element Name: Image
- Data Type: Image (RGB)
- Units: N/A
- Size: Variable (resized to 128×128 or 224×224 pixels for CNN input)
- Description: Dermatoscopic image of a skin lesion
- Source: HAM10000 dataset
- Quality Standards: High-resolution, clear images preferred, with proper lesion visibility
- Notes: Images should not contain patient-identifiable information
- Lesion Type
- Data Element Name: Lesion Type
- Data Type: Categorical
- Units: N/A
- Range: One of the seven lesion categories
- Description: Medical diagnosis of the skin lesion
- Source: HAM10000 dataset (ISIC Archive)
- Quality Standards: Correct and validated labeling
- Notes: Used as the target variable for classification
- Patient Age
- Data Element Name: Patient Age
- Data Type: Numerical
- Units: Years
- Range: 0–100+
- Description: Age of the patient at the time of diagnosis
- Source: HAM10000 dataset
- Quality Standards: No missing or inconsistent values
- Notes: May require normalization for deep learning models
- Anatomical Site
- Data Element Name: Anatomical Site
- Data Type: Categorical
- Units: N/A
- Range: Common body locations (e.g., Face, Back, Arm, Leg)
- Description: The body location where the lesion is present
- Source: HAM10000 dataset
- Quality Standards: Standardized categories
- Notes: Useful for improving classification accuracy
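To make the normalization and standardized-category notes above concrete, here is a small hypothetical sketch: age is min-max scaled to [0, 1] (capping the documented 0–100+ range at 100 for illustration) and the anatomical site is one-hot encoded. The rows are invented.

```python
# Hypothetical metadata rows; values are invented for illustration.
import pandas as pd

meta = pd.DataFrame({
    "age": [25.0, 50.0, 85.0],
    "localization": ["face", "back", "arm"],
})
meta["age_scaled"] = meta["age"] / 100.0               # min-max scale assuming 0-100
meta = pd.get_dummies(meta, columns=["localization"])  # one indicator column per site
```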
Ethical Considerations¶
- Privacy & Anonymization: Since the dataset originates from real patient cases, all personal identifiers are removed to comply with GDPR and HIPAA regulations.
- Clinical Validation: The AI model trained on HAM10000 must be validated against real-world clinical diagnoses before being considered for medical applications.
- Usage Restrictions: The dataset should be used strictly for research and educational purposes, ensuring compliance with ethical AI development in healthcare.
🗂️ Data Collection¶
Data Collection is a foundational step in AI. The aim is to acquire a representative and informative dataset that enables the CNN to understand patterns and make accurate predictions.
Data Source¶
The dataset used in this project is HAM10000 (Human Against Machine with 10,000 Training Images), a publicly available dataset specifically created for skin cancer classification. The source for the data is Kaggle.
Data Storage¶
Proper data storage is crucial for efficient model training and accessibility.
- Current Storage Approach: The dataset is stored locally on disk as image files (.jpg) and metadata in a CSV format (HAM10000_metadata.csv). Images are organized in folders based on lesion type for easy retrieval. TensorFlow’s ImageDataGenerator will be used to load images dynamically during training.
- Future Considerations: If the dataset size increases, cloud storage solutions such as Google Drive, AWS S3, or Azure Blob Storage could be used.
HAM10000 Dataset¶
The HAM10000 dataset (Human Against Machine with 10,000 training images) is a widely used benchmark dataset for skin cancer classification using deep learning. It was created by Philipp Tschandl, Cliff Rosendahl, and Harald Kittler, researchers in dermatology and machine learning, to support the development of automated diagnostic systems for skin cancer. The dataset was published as part of the ISIC (International Skin Imaging Collaboration) initiative and is openly available for research and educational purposes.
The dataset consists of 10,015 dermatoscopic images, collected from multiple sources, including academic clinics in Austria and Australia. The images were acquired through different modalities, ensuring diversity in lighting conditions, resolution, and skin types.
The dataset was annotated and validated by expert dermatologists using histopathology, dermatoscopic follow-up, and expert consensus. The diagnostic labels are highly reliable, making HAM10000 a gold standard dataset for AI research in dermatology.
Sample the Data¶
Let's take a quick look at the data. The metadata file contains key information about each image: image_id, dx (diagnosis, i.e. lesion type), dx_type (how the diagnosis was confirmed), age (of the patient), sex (of the patient), and localization (where the lesion is located on the body).
metadata_path = "archive/HAM10000_metadata.csv"
df = pd.read_csv(metadata_path)
print("Metadata Sample:")
print(df.head())
Metadata Sample:
lesion_id image_id dx dx_type age sex localization
0 HAM_0000118 ISIC_0027419 bkl histo 80.0 male scalp
1 HAM_0000118 ISIC_0025030 bkl histo 80.0 male scalp
2 HAM_0002730 ISIC_0026769 bkl histo 80.0 male scalp
3 HAM_0002730 ISIC_0025661 bkl histo 80.0 male scalp
4 HAM_0001466 ISIC_0031633 bkl histo 75.0 male ear
We also have 2 folders with the images.
images_part1_path = "archive/HAM10000_images_part_1/"
images_part2_path = "archive/HAM10000_images_part_2/"
image_paths = {img: os.path.join(images_part1_path, img) for img in os.listdir(images_part1_path)}
image_paths.update({img: os.path.join(images_part2_path, img) for img in os.listdir(images_part2_path)})
# sampling 5 random images from both folders linking them with the correct label
def display_sample_images(df, image_paths, num_samples=5):
    sample = df.sample(num_samples)
    fig, axes = plt.subplots(1, num_samples, figsize=(15, 5))
    for i, (_, row) in enumerate(sample.iterrows()):
        img_id = row['image_id'] + ".jpg"
        img_path = image_paths.get(img_id, None)
        if img_path and os.path.exists(img_path):
            img = cv2.imread(img_path)
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            axes[i].imshow(img)
            axes[i].set_title(f"Label: {row['dx']}")
            axes[i].axis("off")
    plt.show()
display_sample_images(df, image_paths)
📊 Data Understanding¶
In this step, we explore the metadata and the images, examine how the different lesion types are distributed, and determine what steps are needed to prepare the data for the CNNs.
print("\nDataset Info:")
print(df.info())
Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10015 entries, 0 to 10014
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   lesion_id     10015 non-null  object
 1   image_id      10015 non-null  object
 2   dx            10015 non-null  object
 3   dx_type       10015 non-null  object
 4   age           9958 non-null   float64
 5   sex           10015 non-null  object
 6   localization  10015 non-null  object
dtypes: float64(1), object(6)
memory usage: 547.8+ KB
None
The HAM10000 dataset consists of 10,015 entries (rows), with each row representing a dermatoscopic image of a skin lesion. The dataset contains 7 columns, which provide detailed information about each image, including its diagnosis, diagnostic method, patient demographics, and body location.
The age column has 57 missing values (9958 non-null out of 10,015). This might require handling (to be decided whether filling missing values with the median or removing them is better).
The dataset contains mostly categorical variables (e.g., dx, dx_type, sex, localization). Only one numerical column (age) exists, which is of type float64.
print("\nSummary Statistics:")
print(df.describe())
Summary Statistics:
age
count 9958.000000
mean 51.863828
std 16.968614
min 0.000000
25% 40.000000
50% 50.000000
75% 65.000000
max 85.000000
The summary statistics of the age column give insight into the age distribution of patients in the HAM10000 dataset. There are 9,958 non-null values, so 57 missing values need to be handled. The average age is approximately 51.9 years, suggesting that most individuals are middle-aged or older, and the standard deviation of 16.97 shows that ages vary considerably. The minimum recorded age is 0 years (possibly infants or a placeholder for an unknown age) and the maximum is 85. 25% of patients are 40 years old or younger, the median age is 50, and 75% are 65 or younger.
print("\nMissing Values:")
print(df.isnull().sum())
Missing Values:
lesion_id        0
image_id         0
dx               0
dx_type          0
age             57
sex              0
localization     0
dtype: int64
As we already saw, there are 57 missing values in the Age column that will be further addressed in the Data Preparation step.
plt.figure(figsize=(10,5))
sns.countplot(y=df['dx'], order=df['dx'].value_counts().index, hue=df['dx'], palette="coolwarm", legend=False)
plt.title("Distribution of Skin Lesion Types")
plt.xlabel("Count")
plt.ylabel("Lesion Type")
plt.show()
As we can see we have a big class imbalance. The Melanocytic Nevi (nv) is the most common lesion type with significantly more samples than any other class. Dermatofibroma (df) and Vascular Lesions (vasc) are the least represented classes.
Why does this need to be addressed during Data Preparation?
A model trained on this data may become biased towards detecting "nv" lesions accurately while struggling with the other classes.
plt.figure(figsize=(10,5))
sns.histplot(df['age'].dropna(), bins=30, kde=True, color="purple")
plt.title("Age Distribution of Patients")
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.show()
The histogram above shows the age distribution of patients in the dataset. The most common age group is between 40 and 60 years old, with a peak around 50 years. The Kernel Density Estimation (KDE) curve highlights the overall trend, showing a gradual increase in cases from childhood to middle age, followed by a decline in older age groups.
plt.figure(figsize=(6,4))
sns.countplot(x=df['sex'], hue=df['sex'], palette="Set2", legend=False)
plt.title("Gender Distribution of Patients")
plt.xlabel("Gender")
plt.ylabel("Count")
plt.show()
The bar chart above represents the gender distribution of patients in the dataset. There are more male patients than female patients, but the difference is not extreme. A small number of cases have "unknown" gender, which might indicate missing or incorrect data entries. Overall, the dataset is relatively balanced in terms of gender.
plt.figure(figsize=(10,5))
sns.countplot(y=df['localization'], order=df['localization'].value_counts().index, hue=df['localization'], palette="Blues_r", legend=False)
plt.title("Body Location of Skin Lesions")
plt.xlabel("Count")
plt.ylabel("Body Location")
plt.show()
As we can see, the locations of the lesions vary, but most of them appear on the back, lower extremities, and trunk, which are common areas for skin abnormalities. A small number of cases are labeled as "unknown", indicating missing or unclear location data.
plt.figure(figsize=(10,6))
sns.countplot(data=df, x='sex', hue='dx', palette='muted')
plt.title("Distribution of Lesion Types by Gender")
plt.xlabel("Gender")
plt.ylabel("Count")
plt.legend(title="Lesion Type", loc="upper right")
plt.show()
The grouped bar chart shows the distribution of different skin lesion types across male, female, and unknown gender categories. Melanocytic nevi (nv) is the most common lesion type in both males and females. Benign keratosis-like lesions (bkl) and basal cell carcinoma (bcc) are also frequent, though slightly more common in males. The distribution is relatively balanced across genders, with no strong bias in lesion types. A small number of cases have unknown gender, but their impact is minimal.
bins = [0, 20, 40, 60, 80, 100]
labels = ["0-20", "21-40", "41-60", "61-80", "81-100"]
df["age_group"] = pd.cut(df["age"], bins=bins, labels=labels, include_lowest=True)
plt.figure(figsize=(12,6))
sns.countplot(data=df, x='age_group', hue='dx', palette='coolwarm')
plt.title("Distribution of Lesion Types by Age Group")
plt.xlabel("Age Group")
plt.ylabel("Count")
plt.legend(title="Lesion Type", loc="upper right")
plt.show()
The bar chart illustrates how different skin lesion types are distributed across age groups. Nv is again the most common, particularly in the 21-60 age range, peaking at 41-60 years. Benign keratosis-like lesions (bkl) and melanoma (mel) increase in older age groups, particularly at 61-80 years. The 0-20 age group has the fewest cases, as skin cancer is rare in children.
plt.figure(figsize=(12,6))
sns.heatmap(pd.crosstab(df['localization'], df['dx']), annot=True, cmap="coolwarm", fmt='d')
plt.title("Lesion Type vs. Body Localization")
plt.xlabel("Lesion Type")
plt.ylabel("Body Location")
plt.show()
The heatmap visualizes the distribution of lesions across different locations on the body. Nv is most common on the back, lower extremities and trunk. Rare lesion types appear in limited locations, making classification more challenging.
plt.figure(figsize=(12,6))
sns.boxplot(x="dx", y="age", data=df, hue="dx", palette="Set3", legend=False)
plt.xticks(rotation=30)
plt.title("Age Distribution by Lesion Type")
plt.xlabel("Lesion Type")
plt.ylabel("Age")
plt.show()
The boxplot illustrates the age distribution for each skin lesion type. Akiec and bcc occur mostly in older patients, with median ages around 65-70 years. Melanoma is more common in middle-aged individuals and spans a wider age range. Some outliers are present as well.
display_sample_images(df, image_paths)
In the sample above, you can see a few more of the images in our dataset; next, we analyze their aspect ratios. The histogram below shows a single narrow spike near 0.8, meaning nearly all images share a consistent aspect ratio of about 0.8 (height is roughly 80% of the width). This suggests that the images are uniform in shape.
def get_aspect_ratios(image_paths, num_samples=100):
    ratios = []
    sample_images = random.sample(list(image_paths.values()), num_samples)
    for img_path in sample_images:
        if os.path.exists(img_path):
            img = cv2.imread(img_path)
            h, w = img.shape[:2]
            ratios.append(h / w)
    return np.array(ratios)
aspect_ratios = get_aspect_ratios(image_paths)
plt.figure(figsize=(8,5))
sns.histplot(aspect_ratios, bins=30, kde=True, color="green")
plt.title("Distribution of Image Aspect Ratios")
plt.xlabel("Aspect Ratio (Height/Width)")
plt.ylabel("Frequency")
plt.show()
To conclude the Data Understanding, the dataset provides a diverse collection of dermatoscopic images with seven skin lesion types, accompanied by metadata on age, gender, and body location. Our analysis revealed class imbalance, with melanocytic nevi (nv) being the most frequent lesion and some types underrepresented. Age distribution shows most patients are between 40-70 years old, with certain lesion types more common in older individuals. Lesions are most frequently found on the back, lower extremities, and trunk, aligning with sun exposure patterns. Gender distribution is fairly balanced, with no strong biases. Additionally, the images have a consistent aspect ratio (~0.8), reducing the risk of distortion during preprocessing.
🛠️ Data Preparation¶
Before training our models, we need to ensure the dataset is clean, consistent, and complete. This involves handling missing values and unknown entries in key columns while retaining as much data as possible for better model performance.
- Missing Age Values
The age column has 57 missing values out of 10,015 records. Instead of dropping these rows, we fill them with the median age (50 years) to preserve the dataset size.
df['age'] = df['age'].fillna(df['age'].median())
- Unknown Values in Gender
A small portion of the records have "unknown" as their gender. Instead of removing them, we replace them with the most frequent gender, achieving minimal impact without data loss.
df['sex'] = df['sex'].replace('unknown', df['sex'].mode()[0])
- Unknown Values in Localization
For the localization column, we have a larger number of unknowns. To avoid introducing bias, we will not fill them with the overall most frequent location; we will also not drop these entries, as that would mean losing data.
unknown_localization_count = df[df['localization'] == 'unknown'].shape[0]
unknown_localization_count
234
Since different lesion types appear in specific body locations, we can fill unknown values based on the most frequent location for each lesion type.
lesion_location_map = df.groupby('dx')['localization'].agg(lambda x: x.mode()[0]).to_dict()
df.loc[df['localization'] == 'unknown', 'localization'] = df['dx'].map(lesion_location_map)
null_values = df.isnull().sum()
unknown_values_localization = (df['localization'] == 'unknown').sum()
unknown_values_gender = (df['sex'] == 'unknown').sum()
print("Null Values in Each Column:\n", null_values)
print("\nNumber of 'unknown' values in localization:", unknown_values_localization)
print("Number of 'unknown' values in gender:", unknown_values_gender)
Null Values in Each Column:
lesion_id        0
image_id         0
dx               0
dx_type          0
age              0
sex              0
localization     0
age_group       57
dtype: int64

Number of 'unknown' values in localization: 0
Number of 'unknown' values in gender: 0
if 'age_group' in df.columns:
    df.drop(columns=['age_group'], inplace=True)
print("Remaining columns:", df.columns)
Remaining columns: Index(['lesion_id', 'image_id', 'dx', 'dx_type', 'age', 'sex', 'localization'], dtype='object')
The age_group column was created during Data Understanding and is no longer needed, so we drop it (its 57 null values correspond to the rows whose age was still missing when the bins were computed). Beyond that, there are no more null or unknown values in the dataset, and we can continue with the Predictions chapter.
🔮 Predictions¶
🧹 Preprocessing¶
Before training, we need to prepare the data for the models. This section describes the steps taken to organize, clean, balance, and transform the data before feeding it into a deep learning model. The preprocessing steps ensure that:
- Image paths are mapped correctly to their corresponding labels.
- Metadata is cleaned and encoded for compatibility with machine learning models.
- The dataset is balanced using Synthetic Minority Over-sampling Technique (SMOTE) to prevent class imbalance issues.
- Data augmentation and normalization are applied to improve generalization.
- Labels are encoded into numerical format for proper classification.
- Mapping Image Paths
In this step, we map each image ID to its corresponding file path within the dataset folders. Since the images are stored across two directories, we iterate through them and create a dictionary that associates each image ID with its full file path. This ensures that we can easily retrieve images when loading them into the model. This is essential because raw image IDs alone are not useful for training.
IMG_SIZE = (224, 224)
image_paths = {}
for img_folder in ["archive/HAM10000_images_part_1", "archive/HAM10000_images_part_2"]:
    for img_name in os.listdir(img_folder):
        image_paths[img_name.split(".")[0]] = os.path.join(img_folder, img_name)
df['image_path'] = df['image_id'].map(image_paths)
print(df[['image_id', 'image_path']].head())
       image_id                                       image_path
0  ISIC_0027419  archive/HAM10000_images_part_1\ISIC_0027419.jpg
1  ISIC_0025030  archive/HAM10000_images_part_1\ISIC_0025030.jpg
2  ISIC_0026769  archive/HAM10000_images_part_1\ISIC_0026769.jpg
3  ISIC_0025661  archive/HAM10000_images_part_1\ISIC_0025661.jpg
4  ISIC_0031633  archive/HAM10000_images_part_2\ISIC_0031633.jpg
- Metadata Processing & Data Balancing Using SMOTE
The dataset is split into training (80%) and test (20%) sets using stratification to preserve the class distribution. SMOTE (Synthetic Minority Over-sampling Technique) is then applied to the training metadata, generating synthetic examples for underrepresented classes so the model can learn from all categories without bias. Because SMOTE produces synthetic metadata rows rather than new images, each oversampled row is paired with the path of a randomly chosen real training image of the same class, keeping the image-label mapping consistent. Note that the image_path column is excluded from the feature matrix so it is not one-hot encoded.
df_metadata = df.drop(columns=['lesion_id', 'image_id', 'image_path'])
X = df_metadata.drop(columns=['dx'])
y = df_metadata['dx']
X = pd.get_dummies(X, drop_first=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
smote = SMOTE(random_state=42)
X_train_sm, y_train_sm = smote.fit_resample(X_train, y_train)
df_train_sm = pd.DataFrame(X_train_sm, columns=X.columns)
df_train_sm['dx'] = y_train_sm
# pair each synthetic row with a real training image of the same class
train_paths = df.loc[X_train.index, 'image_path']
synthetic_labels = y_train_sm.iloc[len(X_train):]
synthetic_paths = [
    np.random.choice(train_paths[y_train == label].values)
    for label in synthetic_labels
]
image_paths_extended = np.concatenate([train_paths.values, synthetic_paths])
df_train_sm['image_path'] = image_paths_extended
df_test = df.loc[X_test.index, ['dx', 'image_path']]
print("Original Training Samples:", len(X_train))
print("After SMOTE:", len(df_train_sm))
print("Number of Image Paths Assigned:", len(df_train_sm['image_path']))
print(df_train_sm.head())
Original Training Samples: 8012
After SMOTE: 37548
Number of Image Paths Assigned: 37548
    age  dx_type_consensus  dx_type_follow_up  dx_type_histo  sex_male  \
0  35.0              False              False           True     False
1  40.0              False               True          False      True
2  65.0              False              False           True      True
3  40.0              False               True          False      True
4  65.0              False              False           True      True

   localization_back  ...     dx                                       image_path
0              False  ...     nv  archive/HAM10000_images_part_2\ISIC_0033319.jpg
1              False  ...     nv  archive/HAM10000_images_part_2\ISIC_0030823.jpg
2              False  ...  akiec  archive/HAM10000_images_part_1\ISIC_0028730.jpg
3              False  ...     nv  archive/HAM10000_images_part_1\ISIC_0027299.jpg
4               True  ...     nv  archive/HAM10000_images_part_2\ISIC_0032444.jpg
Now, we plot to see the dataset before and after SMOTE.
plt.figure(figsize=(10,5))
sns.countplot(y=y_train, order=y_train.value_counts().index, hue=y_train, palette="coolwarm", legend=False)
plt.title("Class Distribution Before SMOTE")
plt.xlabel("Count")
plt.ylabel("Lesion Type")
plt.show()
plt.figure(figsize=(10,5))
sns.countplot(y=y_train_sm, order=pd.Series(y_train_sm).value_counts().index, hue=y_train_sm, palette="coolwarm", legend=False)
plt.title("Class Distribution After SMOTE")
plt.xlabel("Count")
plt.ylabel("Lesion Type")
plt.show()
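In this notebook SMOTE operates on flattened pixel arrays (hence the 10,034-column frame above). Conceptually, it creates each synthetic sample by interpolating between a minority-class sample and a same-class neighbour. A minimal NumPy sketch of that interpolation step (a simplification: `smote_oversample` is a hypothetical helper that pairs each sample with a *random* same-class partner rather than a k-nearest neighbour, unlike imblearn's `SMOTE`):

```python
import numpy as np

def smote_oversample(X, y, target_count, rng=None):
    """Oversample every class up to target_count by interpolating between
    random same-class pairs (a simplified sketch of SMOTE)."""
    rng = np.random.default_rng(rng)
    X_out, y_out = [X], [y]
    for cls in np.unique(y):
        X_cls = X[y == cls]
        for _ in range(target_count - len(X_cls)):
            a, b = X_cls[rng.integers(len(X_cls), size=2)]
            lam = rng.random()                       # interpolation factor in [0, 1)
            X_out.append((a + lam * (b - a))[None])  # synthetic point on the segment a -> b
            y_out.append(np.array([cls]))
    return np.concatenate(X_out), np.concatenate(y_out)

# toy example: 7 samples of class 0, 3 of class 1 -> balanced at 7 each
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 7 + [1] * 3)
X_sm, y_sm = smote_oversample(X, y, target_count=7, rng=0)
print(np.bincount(y_sm))  # [7 7]
```

Because the synthetic points lie on line segments between real flattened images, they are plausible pixel-wise blends rather than exact duplicates, which is what the countplot above reflects.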
- Data Augmentation and Normalization
ImageDataGenerator performs real-time image preprocessing, so augmented variants are generated on the fly instead of being stored as extra copies on disk. Augmentation techniques (rotation, zoom, flipping) introduce slight variations that help the model generalize better. Only the training generator applies augmentation; the validation generator merely rescales pixel values, so performance is evaluated on unaltered images. The training data is shuffled each epoch (shuffle=True), while the validation data keeps a fixed order (shuffle=False) so predictions can later be matched back to their labels.
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=True
)
val_datagen = ImageDataGenerator(rescale=1./255)  # no augmentation for validation
train_generator = train_datagen.flow_from_dataframe(
    dataframe=df_train_sm,
    directory=None,
    x_col="image_path",
    y_col="dx",
    target_size=(224, 224),
    batch_size=32,
    class_mode="sparse",
    shuffle=True
)
val_generator = val_datagen.flow_from_dataframe(
    dataframe=df_test,
    directory=None,
    x_col="image_path",
    y_col="dx",
    target_size=(224, 224),
    batch_size=32,
    class_mode="sparse",
    shuffle=False
)
Found 37548 validated image filenames belonging to 7 classes. Found 2003 validated image filenames belonging to 7 classes.
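These counts also explain the progress bars in the training logs further below: with 37,548 training images and a batch size of 32, Keras runs ceil(37548 / 32) batches per epoch.

```python
import math

n_train, batch_size = 37548, 32
steps_per_epoch = math.ceil(n_train / batch_size)
print(steps_per_epoch)  # 1174, matching the "1174/1174" progress bars in the logs
```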
- Label Encoding
The labels (lesion types) are categorical strings (e.g., "mel", "bkl", "nv"). Since neural networks require numerical labels, we use LabelEncoder to convert them into integers. The num_classes variable stores the number of unique classes in the dataset, which will be used later when defining the model’s output layer.
label_encoder = LabelEncoder()
label_encoder.fit(df_train_sm['dx'])
num_classes = len(label_encoder.classes_)
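For reference, `LabelEncoder` assigns integer codes in alphabetical order of the class strings, so the mapping is easy to reproduce by hand (a plain-Python sketch of the behaviour, not the scikit-learn implementation). Note that `flow_from_dataframe` with `class_mode="sparse"` builds its own class index the same way (alphabetically sorted class names), so the two mappings should agree here.

```python
labels = ["nv", "mel", "bkl", "nv", "akiec"]   # example lesion codes
classes = sorted(set(labels))                   # LabelEncoder sorts classes alphabetically
to_int = {c: i for i, c in enumerate(classes)}
encoded = [to_int[c] for c in labels]
print(classes)   # ['akiec', 'bkl', 'mel', 'nv']
print(encoded)   # [3, 2, 1, 3, 0]
```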
🧬 Modelling¶
In this section, we define and train multiple deep learning models for skin lesion classification on the HAM10000 dataset. The models vary in complexity, ranging from pre-trained architectures (EfficientNet, ResNet) to a custom CNN. The key goals of this modeling phase are:
- Compare different architectures to determine which performs best.
- Leverage transfer learning to use pre-trained models for better feature extraction.
- Evaluate a custom-built CNN to see how it compares to pre-trained models.
- Use learning rate reduction and dropout layers to improve training stability and reduce overfitting.
1. Pre-Trained CNN - EfficientNetB0¶
EfficientNetB0 is a lightweight convolutional neural network designed for high accuracy with relatively few parameters. We use a version of EfficientNetB0 pre-trained on ImageNet to extract deep image features, while the classification layers are trained from scratch to adapt to our dataset. Since pre-trained models already contain learned representations of edges, textures, and patterns, we freeze the convolutional base to preserve these features. In place of the original classification head, we add a global average pooling layer, followed by two dense layers (256 and 128 neurons) with dropout (0.5) to prevent overfitting. The final output layer uses softmax activation to predict the probability distribution over the seven skin lesion classes.
base_model = EfficientNetB0(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(256, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.5)(x)
output_layer = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=output_layer)
model.compile(optimizer=Adam(learning_rate=0.0001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history = model.fit(
train_generator,
validation_data=val_generator,
epochs=20,
verbose=1
)
model.save("efficientnet_ham10000_smote.h5")
2. Custom CNN¶
To establish a baseline, we also implement a custom CNN built from scratch. This model consists of three convolutional layers with increasing filter sizes (32, 64, 128), followed by max-pooling layers to reduce spatial dimensions. The extracted features are then flattened and passed through a fully connected layer with 256 neurons and a dropout layer to reduce overfitting. Finally, the output layer, using softmax activation, classifies the input into one of the seven classes.
Unlike pre-trained models, this CNN does not benefit from any prior knowledge, so it starts learning from scratch. This experiment helps us compare the effectiveness of a hand-crafted architecture against pre-trained feature extractors.
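The shape arithmetic behind this stack can be traced by hand: each 3×3 "valid" convolution shrinks the spatial dimensions by 2, and each 2×2 max-pool halves them (rounding down). A quick sketch for the 224×224 input used here:

```python
def conv_valid(size, kernel=3):
    return size - kernel + 1   # a 'valid' convolution shrinks each dimension by kernel - 1

def pool(size, window=2):
    return size // window      # a 2x2 max-pool halves the size, rounding down

s = 224
for _ in range(3):             # three Conv2D + MaxPooling2D stages: 224 -> 111 -> 54 -> 26
    s = pool(conv_valid(s))
flat = s * s * 128             # 128 filters in the last convolutional layer
print(s, flat)  # 26 86528 -> the Flatten layer feeds 86,528 values into Dense(256)
```

This explains why the dense layer dominates the parameter count of the custom CNN: its weight matrix alone has 86,528 × 256 entries.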
cnn_model = Sequential([
Conv2D(32, (3,3), activation='relu', input_shape=(224,224,3)),
MaxPooling2D(2,2),
Conv2D(64, (3,3), activation='relu'),
MaxPooling2D(2,2),
Conv2D(128, (3,3), activation='relu'),
MaxPooling2D(2,2),
Flatten(),
Dense(256, activation='relu'),
Dropout(0.5),
Dense(len(label_encoder.classes_), activation='softmax')
])
cnn_model.compile(
optimizer=Adam(learning_rate=0.0001),
loss="sparse_categorical_crossentropy",
metrics=["accuracy"]
)
history = cnn_model.fit(
train_generator,
validation_data=val_generator,
epochs=20,
verbose=1
)
cnn_model.save("cnn_skin_lesion_classifier_smote.h5")
Epoch 1/20
1174/1174 [==============================] - 806s 686ms/step - loss: 1.9449 - accuracy: 0.1625 - val_loss: 1.9002 - val_accuracy: 0.4104
Epoch 2/20
1174/1174 [==============================] - 800s 681ms/step - loss: 1.9400 - accuracy: 0.1685 - val_loss: 1.8491 - val_accuracy: 0.5422
Epoch 3/20
1174/1174 [==============================] - 928s 791ms/step - loss: 1.9372 - accuracy: 0.1729 - val_loss: 1.8671 - val_accuracy: 0.5617
Epoch 4/20
1174/1174 [==============================] - 810s 690ms/step - loss: 1.9353 - accuracy: 0.1742 - val_loss: 1.8425 - val_accuracy: 0.5726
Epoch 5/20
1174/1174 [==============================] - 800s 681ms/step - loss: 1.9328 - accuracy: 0.1773 - val_loss: 1.8429 - val_accuracy: 0.5647
Epoch 6/20
1174/1174 [==============================] - 799s 680ms/step - loss: 1.9317 - accuracy: 0.1832 - val_loss: 1.8466 - val_accuracy: 0.5971
Epoch 7/20
1174/1174 [==============================] - 800s 681ms/step - loss: 1.9294 - accuracy: 0.1839 - val_loss: 1.7998 - val_accuracy: 0.6435
Epoch 8/20
1174/1174 [==============================] - 802s 683ms/step - loss: 1.9280 - accuracy: 0.1843 - val_loss: 1.7920 - val_accuracy: 0.6695
Epoch 9/20
1174/1174 [==============================] - 801s 682ms/step - loss: 1.9265 - accuracy: 0.1879 - val_loss: 1.8302 - val_accuracy: 0.6186
Epoch 10/20
1174/1174 [==============================] - 800s 681ms/step - loss: 1.9258 - accuracy: 0.1878 - val_loss: 1.8372 - val_accuracy: 0.6151
Epoch 11/20
1174/1174 [==============================] - 802s 683ms/step - loss: 1.9244 - accuracy: 0.1887 - val_loss: 1.8908 - val_accuracy: 0.4823
Epoch 12/20
1174/1174 [==============================] - 801s 682ms/step - loss: 1.9242 - accuracy: 0.1879 - val_loss: 1.7820 - val_accuracy: 0.6650
Epoch 13/20
1174/1174 [==============================] - 800s 681ms/step - loss: 1.9223 - accuracy: 0.1915 - val_loss: 1.8025 - val_accuracy: 0.6520
Epoch 14/20
1174/1174 [==============================] - 799s 681ms/step - loss: 1.9212 - accuracy: 0.1910 - val_loss: 1.8119 - val_accuracy: 0.5966
Epoch 15/20
1174/1174 [==============================] - 802s 683ms/step - loss: 1.9193 - accuracy: 0.1947 - val_loss: 1.8601 - val_accuracy: 0.5856
Epoch 16/20
1174/1174 [==============================] - 879s 749ms/step - loss: 1.9192 - accuracy: 0.1954 - val_loss: 1.7930 - val_accuracy: 0.6161
Epoch 17/20
1174/1174 [==============================] - 856s 729ms/step - loss: 1.9185 - accuracy: 0.1951 - val_loss: 1.8205 - val_accuracy: 0.6390
Epoch 18/20
1174/1174 [==============================] - 799s 681ms/step - loss: 1.9173 - accuracy: 0.1970 - val_loss: 1.7848 - val_accuracy: 0.6655
Epoch 19/20
1174/1174 [==============================] - 796s 678ms/step - loss: 1.9175 - accuracy: 0.1993 - val_loss: 1.7693 - val_accuracy: 0.6805
Epoch 20/20
1174/1174 [==============================] - 781s 665ms/step - loss: 1.9158 - accuracy: 0.1982 - val_loss: 1.7678 - val_accuracy: 0.6890
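Rather than reading metrics off the console, the History object returned by `fit` exposes them as lists in `history.history`. A small helper for locating the best validation epoch (`best_epoch` is a hypothetical convenience function, assuming the standard `history.history` dictionary layout):

```python
def best_epoch(history_dict):
    """Return (1-based epoch index, value) of the best val_accuracy in a history dict."""
    val_acc = history_dict["val_accuracy"]
    best = max(range(len(val_acc)), key=val_acc.__getitem__)
    return best + 1, val_acc[best]

# usage with the last five epochs of the log above:
hist = {"val_accuracy": [0.6161, 0.6390, 0.6655, 0.6805, 0.6890]}
print(best_epoch(hist))  # (5, 0.689)
```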
3. Pre-Trained CNN - ResNet50¶
To further investigate model performance, we train a ResNet50 model from scratch without pre-trained weights. Training deep networks like ResNet from scratch can be computationally expensive and data-intensive, so we use a reduced dataset with 50 images per class to make training faster. Unlike pre-trained models, this version of ResNet50 learns features entirely from our dataset, which means it takes longer to converge and may require more data for optimal performance.
To prevent overfitting and allow the model to learn effectively, we use a learning rate reduction callback (ReduceLROnPlateau) that lowers the learning rate if validation accuracy does not improve for several epochs.
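The callback's core logic is simple to state: keep the best metric seen so far, and once it has failed to improve for `patience` consecutive epochs, multiply the learning rate by `factor`, with a floor at `min_lr`. A pure-Python sketch of this behaviour (a simplification of Keras's `ReduceLROnPlateau`, which additionally supports `min_delta` and a cooldown period):

```python
def reduce_lr_on_plateau(val_history, lr=1e-5, patience=5, factor=0.5, min_lr=1e-7):
    """Simulate the learning-rate schedule over a list of per-epoch val_accuracy values."""
    best, wait = float("-inf"), 0
    for metric in val_history:
        if metric > best:
            best, wait = metric, 0          # improvement: reset the patience counter
        else:
            wait += 1
            if wait >= patience:            # plateau for `patience` epochs
                lr, wait = max(lr * factor, min_lr), 0
    return lr

# val_accuracy improves once, then stalls for five epochs -> one halving of the rate
print(reduce_lr_on_plateau([0.3, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5], lr=1e-5))  # 5e-06
```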
df_train_sample = df_train_sm.groupby('dx').sample(n=50, replace=False).reset_index(drop=True)
train_generator_sample = train_datagen.flow_from_dataframe(
dataframe=df_train_sample,
directory=None,
x_col="image_path",
y_col="dx",
target_size=(224, 224),
batch_size=32,
class_mode="sparse",
shuffle=True
)
Found 350 validated image filenames belonging to 7 classes.
learning_rate_reduction = ReduceLROnPlateau(monitor='val_accuracy', patience=5, verbose=1, factor=0.5, min_lr=1e-7)
input_shape = (224,224,3)
lr = 1e-5
epochs = 20
batch_size = 64
model = ResNet50(
    include_top=True,   # build the full network, including the classification head
    weights=None,       # no pre-trained weights: train entirely from scratch
    input_tensor=None,
    input_shape=input_shape,
    classes=7           # the `pooling` argument only applies when include_top=False
)
model.compile(
    optimizer=Adam(lr),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)
history = model.fit(
    train_generator_sample,
    validation_data=val_generator,
    epochs=epochs,      # batch size is fixed by the generators, so it is not passed here
    verbose=2,
    callbacks=[learning_rate_reduction]
)
model.save("resnet50_sampled_ham10000.h5")
4. Pre-Trained CNN - ResNet50V2¶
Finally, we experiment with ResNet50V2, a more advanced version of ResNet50, and this time we use the full dataset again. Unlike the previous ResNet model, this version uses pre-trained ImageNet weights and is employed as a feature extractor: its convolutional layers are frozen during initial training. The extracted features are then processed through a global average pooling layer, followed by fully connected layers with dropout to prevent overfitting.
By using a low learning rate (0.0001) and dropout layers (0.5 probability), the model generalizes better to unseen data. Since ResNet50V2 refines the original ResNet50 design with pre-activation residual blocks, it typically achieves better results when used for transfer learning.
base_model = ResNet50V2(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(256, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.5)(x)
output_layer = Dense(num_classes, activation='softmax')(x)
resnet_model = Model(inputs=base_model.input, outputs=output_layer)
resnet_model.compile(optimizer=Adam(learning_rate=0.0001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
resnet_model.summary()
history_resnet = resnet_model.fit(
train_generator,
validation_data=val_generator,
epochs=20,
verbose=1
)
resnet_model.save("resnet50v2_ham10000_smote.h5")
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50v2_weights_tf_dim_ordering_tf_kernels_notop.h5
94668760/94668760 [==============================] - 11s 0us/step
Model: "model_14"
__________________________________________________________________________________________________
 Layer (type)                     Output Shape           Param #    Connected to
==================================================================================================
 input_6 (InputLayer)             [(None, 224, 224, 3)]  0          []
 ... (frozen ResNet50V2 convolutional backbone, conv1 through conv5, omitted for brevity) ...
 post_bn (BatchNormalization)     (None, 7, 7, 2048)     8192       ['conv5_block3_out[0][0]']
 post_relu (Activation)           (None, 7, 7, 2048)     0          ['post_bn[0][0]']
 global_average_pooling2d_4       (None, 2048)           0          ['post_relu[0][0]']
 (GlobalAveragePooling2D)
 dense_16 (Dense)                 (None, 256)            524544     ['global_average_pooling2d_4[0][0]']
 dropout_10 (Dropout)             (None, 256)            0          ['dense_16[0][0]']
 dense_17 (Dense)                 (None, 128)            32896      ['dropout_10[0][0]']
 dropout_11 (Dropout)             (None, 128)            0          ['dense_17[0][0]']
 dense_18 (Dense)                 (None, 7)              903        ['dropout_11[0][0]']
==================================================================================================
Total params: 24123143 (92.02 MB)
Trainable params: 558343 (2.13 MB)
Non-trainable params: 23564800 (89.89 MB)
__________________________________________________________________________________________________
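The trainable-parameter count in the summary can be verified by hand: only the dense head is trainable, and a dense layer mapping n inputs to m outputs carries n*m weights plus m biases.

```python
def dense_params(n_in, n_out):
    return n_in * n_out + n_out    # weight matrix plus bias vector

trainable = (dense_params(2048, 256)    # pooled ResNet50V2 features -> Dense(256)
             + dense_params(256, 128)   # Dense(256) -> Dense(128)
             + dense_params(128, 7))    # Dense(128) -> softmax over 7 classes
print(trainable)  # 558343, matching "Trainable params" in the summary above
```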
Epoch 1/20
1174/1174 [==============================] - 1757s 1s/step - loss: 2.0065 - accuracy: 0.1496 - val_loss: 1.9346 - val_accuracy: 0.2476
Epoch 2/20
1174/1174 [==============================] - 1147s 977ms/step - loss: 1.9480 - accuracy: 0.1491 - val_loss: 1.9269 - val_accuracy: 0.3295
Epoch 3/20
1174/1174 [==============================] - 1024s 872ms/step - loss: 1.9429 - accuracy: 0.1600 - val_loss: 1.9181 - val_accuracy: 0.4508
Epoch 4/20
1174/1174 [==============================] - 1088s 926ms/step - loss: 1.9389 - accuracy: 0.1648 - val_loss: 1.9229 - val_accuracy: 0.5037
Epoch 5/20
1174/1174 [==============================] - 1080s 919ms/step - loss: 1.9358 - accuracy: 0.1700 - val_loss: 1.8429 - val_accuracy: 0.5771
Epoch 6/20
1174/1174 [==============================] - 1075s 916ms/step - loss: 1.9348 - accuracy: 0.1731 - val_loss: 1.8923 - val_accuracy: 0.5567
Epoch 7/20
1174/1174 [==============================] - 1075s 916ms/step - loss: 1.9329 - accuracy: 0.1778 - val_loss: 1.8891 - val_accuracy: 0.5946
Epoch 8/20
1174/1174 [==============================] - 1078s 918ms/step - loss: 1.9296 - accuracy: 0.1800 - val_loss: 1.8600 - val_accuracy: 0.6176
Epoch 9/20
1174/1174 [==============================] - 1073s 913ms/step - loss: 1.9293 - accuracy: 0.1771 - val_loss: 1.8336 - val_accuracy: 0.6520
Epoch 10/20
1174/1174 [==============================] - 1076s 917ms/step - loss: 1.9272 - accuracy: 0.1810 - val_loss: 1.8446 - val_accuracy: 0.6515
Epoch 11/20
1174/1174 [==============================] - 1073s 914ms/step - loss: 1.9266 - accuracy: 0.1814 - val_loss: 1.8293 - val_accuracy: 0.6630
Epoch 12/20
1174/1174 [==============================] - 1180s 1s/step - loss: 1.9257 - accuracy: 0.1837 - val_loss: 1.8427 - val_accuracy: 0.6580
Epoch 13/20
1174/1174 [==============================] - 1108s 944ms/step - loss: 1.9239 - accuracy: 0.1868 - val_loss: 1.8470 - val_accuracy: 0.6326
Epoch 14/20
1174/1174 [==============================] - 1078s 918ms/step - loss: 1.9238 - accuracy: 0.1870 - val_loss: 1.8522 - val_accuracy: 0.6365
Epoch 15/20
1174/1174 [==============================] - 1080s 920ms/step - loss: 1.9231 - accuracy: 0.1864 - val_loss: 1.8184 - val_accuracy: 0.6600
Epoch 16/20
1174/1174 [==============================] - 1081s 921ms/step - loss: 1.9232 - accuracy: 0.1858 - val_loss: 1.8114 - val_accuracy: 0.6850
Epoch 17/20
1174/1174 [==============================] - 1077s 918ms/step - loss: 1.9228 - accuracy: 0.1887 - val_loss: 1.8162 - val_accuracy: 0.6700
Epoch 18/20
1174/1174 [==============================] - 1081s 920ms/step - loss: 1.9214 - accuracy: 0.1902 - val_loss: 1.7914 - val_accuracy: 0.6990
Epoch 19/20
1174/1174 [==============================] - 1086s 924ms/step - loss: 1.9215 - accuracy: 0.1890 - val_loss: 1.8029 - val_accuracy: 0.6815
Epoch 20/20
1174/1174 [==============================] - 1083s 923ms/step - loss: 1.9214 - accuracy: 0.1901 - val_loss: 1.8174 - val_accuracy: 0.6780
C:\Users\anika\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\keras\src\engine\training.py:3103: UserWarning: You are saving your model as an HDF5 file via `model.save()`. This file format is considered legacy. We recommend using instead the native Keras format, e.g. `model.save('my_model.keras')`.
saving_api.save_model(
4. Hyperparameter Tuning¶
Hyperparameter tuning is a crucial step in optimizing model performance by adjusting parameters such as learning rate, number of layers, dropout rates, and batch size. In this project, we experimented with different model variations, as already shown in the previous sections. We trained both pre-trained models (EfficientNetB0, ResNet50V2) and custom-built models (CNN, ResNet50 from scratch) to compare their effectiveness. To improve training stability and generalization, we applied learning rate reduction techniques (ReduceLROnPlateau), dropout layers, and different training durations.
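The plateau-based learning-rate reduction mentioned above can be sketched without Keras to show what `ReduceLROnPlateau` actually does epoch by epoch. This is a minimal re-implementation of the callback's core logic; the `factor`, `patience`, and `min_lr` values here are illustrative defaults, not necessarily the exact settings used in our training runs.

```python
def reduce_lr_on_plateau(val_losses, lr=1e-3, factor=0.5, patience=3, min_lr=1e-6):
    """Replay a validation-loss history and return the LR used after each epoch."""
    best = float("inf")
    wait = 0
    lrs = []
    for loss in val_losses:
        if loss < best:          # improvement: remember it and reset patience
            best = loss
            wait = 0
        else:                    # no improvement this epoch
            wait += 1
            if wait >= patience: # plateau detected: shrink the learning rate
                lr = max(lr * factor, min_lr)
                wait = 0
        lrs.append(lr)
    return lrs

# A loss curve that stalls for three epochs triggers one halving.
print(reduce_lr_on_plateau([1.0, 0.9, 0.9, 0.9, 0.9, 0.8]))
# → [0.001, 0.001, 0.001, 0.001, 0.0005, 0.0005]
```

The same behaviour is what we relied on in training: the LR stays put while validation loss improves and is multiplied by `factor` once it flatlines for `patience` epochs.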
🧐 Evaluation¶
After training multiple models, we need to evaluate their performance on the validation/test set. The evaluation process involves:
- Loading the trained models from their saved .h5 files.
- Measuring key performance metrics such as:
- Validation accuracy: The percentage of correctly classified images.
- Loss value: How well the model's predictions match the ground truth.
- Confusion Matrix: A breakdown of correct and incorrect predictions per class.
- Classification Report: Precision, recall, and F1-score for each class.
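All of the listed metrics derive from the confusion matrix. As a minimal NumPy sketch (with a made-up 3-class matrix, not our actual results): the diagonal holds the true positives, column sums give each class's total predictions, and row sums give each class's actual support.

```python
import numpy as np

# Toy confusion matrix: rows = actual class, columns = predicted class.
cm = np.array([[50,  5,  5],
               [10, 30, 10],
               [ 0,  5, 45]])

tp = np.diag(cm).astype(float)
precision = tp / cm.sum(axis=0)   # TP / (all predictions of that class)
recall    = tp / cm.sum(axis=1)   # TP / (all actual samples of that class)
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()    # all correct / all samples

print("precision:", precision.round(2))
print("recall:   ", recall.round(2))
print("f1:       ", f1.round(2))
print("accuracy: ", round(accuracy, 3))
```

These are exactly the per-class numbers that `sklearn.metrics.classification_report` prints for our real models below.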
1. Load Each Trained Model¶
efficientnet_model = load_model("efficientnet_ham10000_smote.h5")
cnn_model = load_model("cnn_skin_lesion_classifier_smote.h5")
resnet50_sampled_model = load_model("resnet50_sampled_ham10000.h5")
resnet50v2_model = load_model("resnet50v2_ham10000_smote.h5")
2. Evaluate Models on the Validation Set¶
efficientnet_results = efficientnet_model.evaluate(val_generator, verbose=1)
cnn_results = cnn_model.evaluate(val_generator, verbose=1)
resnet50_sampled_results = resnet50_sampled_model.evaluate(val_generator, verbose=1)
resnet50v2_results = resnet50v2_model.evaluate(val_generator, verbose=1)
print(f"EfficientNetB0 - Loss: {efficientnet_results[0]:.4f}, Accuracy: {efficientnet_results[1]:.4f}")
print(f"Custom CNN - Loss: {cnn_results[0]:.4f}, Accuracy: {cnn_results[1]:.4f}")
print(f"ResNet50 (Small Dataset) - Loss: {resnet50_sampled_results[0]:.4f}, Accuracy: {resnet50_sampled_results[1]:.4f}")
print(f"ResNet50V2 - Loss: {resnet50v2_results[0]:.4f}, Accuracy: {resnet50v2_results[1]:.4f}")
63/63 [==============================] - 39s 601ms/step - loss: 1.9461 - accuracy: 0.1098
63/63 [==============================] - 21s 325ms/step - loss: 1.7683 - accuracy: 0.6855
63/63 [==============================] - 78s 1s/step - loss: 1.4975 - accuracy: 0.6695
63/63 [==============================] - 58s 903ms/step - loss: 1.8204 - accuracy: 0.6575
EfficientNetB0 - Loss: 1.9461, Accuracy: 0.1098
Custom CNN - Loss: 1.7683, Accuracy: 0.6855
ResNet50 (Small Dataset) - Loss: 1.4975, Accuracy: 0.6695
ResNet50V2 - Loss: 1.8204, Accuracy: 0.6575
EfficientNetB0 surprisingly shows the lowest validation accuracy at 10.98%, which suggests that despite being a powerful pre-trained feature extractor, it is not well-suited to this dataset without further fine-tuning. One reason could be that EfficientNetB0 is designed for natural images (e.g., ImageNet), which differ significantly from medical images in structure and detail. A more technical cause may also be at play: the Keras EfficientNet variants include their own input rescaling, so feeding them images already normalized to [0, 1] can leave the network near the random baseline (~14% for seven classes), which would be consistent with the flat ~15–19% training accuracy seen above.
The custom CNN performs unexpectedly well, achieving 68.55% accuracy, which suggests that a simpler model trained from scratch may better adapt to the dataset than pre-trained networks. This could be because pre-trained models rely on high-level ImageNet features that may not transfer well to skin lesion classification, whereas a smaller, custom CNN learns dataset-specific patterns from the start.
The ResNet50 trained on a small dataset reaches 66.95% accuracy, similar to the custom CNN. Despite training on a limited number of images, it still generalizes reasonably well, suggesting that even a subset of real data can be informative. Adding more data could lead to better results, but this was out of scope given the time constraints and this model's long training time.
Lastly, ResNet50V2 reaches 65.75% accuracy, slightly lower than the other models but still within the same range. This suggests that while deep feature extraction helps, the model may still need further fine-tuning, especially since it was trained with frozen convolutional layers. But again, due to time limitations, further fine-tuning of this model was out-of-scope.
Overall, considering the validation data consists of real images while the training set includes many synthetic samples, the models are performing reasonably well. Additionally, our dataset consists of seven distinct classes, which is significantly more complex than a simple benign vs. malignant classification.
3. Generate Predictions for Confusion Matrix & Classification Report¶
y_true = val_generator.classes
y_pred_efficientnet = np.argmax(efficientnet_model.predict(val_generator), axis=1)
y_pred_cnn = np.argmax(cnn_model.predict(val_generator), axis=1)
y_pred_resnet50_sampled = np.argmax(resnet50_sampled_model.predict(val_generator), axis=1)
y_pred_resnet50v2 = np.argmax(resnet50v2_model.predict(val_generator), axis=1)
cm_efficientnet = confusion_matrix(y_true, y_pred_efficientnet)
cm_cnn = confusion_matrix(y_true, y_pred_cnn)
cm_resnet50_sampled = confusion_matrix(y_true, y_pred_resnet50_sampled)
cm_resnet50v2 = confusion_matrix(y_true, y_pred_resnet50v2)
def plot_confusion_matrix(cm, title):
    plt.figure(figsize=(8, 6))
    sns.heatmap(cm, annot=True, fmt='d', cmap="Blues",
                xticklabels=val_generator.class_indices, yticklabels=val_generator.class_indices)
    plt.xlabel("Predicted")
    plt.ylabel("Actual")
    plt.title(title)
    plt.show()
plot_confusion_matrix(cm_efficientnet, "EfficientNetB0 - Confusion Matrix")
plot_confusion_matrix(cm_cnn, "Custom CNN - Confusion Matrix")
plot_confusion_matrix(cm_resnet50_sampled, "ResNet50 (Small Dataset) - Confusion Matrix")
plot_confusion_matrix(cm_resnet50v2, "ResNet50V2 - Confusion Matrix")
63/63 [==============================] - 40s 625ms/step
63/63 [==============================] - 22s 348ms/step
63/63 [==============================] - 83s 1s/step
63/63 [==============================] - 57s 897ms/step
All models show a bias toward "nv". Although the dataset was balanced with SMOTE, "nv" was the dominant class before balancing and therefore contains by far the most real images. Models tend to struggle to generalize from synthetic data, meaning that even though SMOTE balanced the class counts, it did not fully capture the variability needed. The Custom CNN and ResNet50V2 show better differentiation between classes but still exhibit this bias. It is also worth noting that pre-trained models benefit from a good mix of real and synthetic data, which we did not have here. Looking at other notebooks on Kaggle, we noticed that even less complex models produce results similar to those in this notebook, and that reframing the task as a binary benign vs. malignant classification tends to yield better scores.
Further improvement steps could include trying hybrid balancing techniques such as SMOTE-Tomek and SMOTE-ENN, which combine oversampling with cleaning of noisy or borderline samples.
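To make the SMOTE discussion concrete, the core step of the algorithm is a simple interpolation: a synthetic sample is placed on the line segment between a minority point and one of its k nearest minority neighbours. The sketch below shows only that core step on toy 2-D data; real SMOTE (e.g. in `imblearn`) additionally handles class selection and repeats this until the classes are balanced.

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_sample(X_minority, k=3):
    """Generate ONE synthetic minority sample (core SMOTE interpolation step)."""
    i = rng.integers(len(X_minority))
    x = X_minority[i]
    d = np.linalg.norm(X_minority - x, axis=1)  # distance to every minority point
    d[i] = np.inf                               # exclude the point itself
    neighbors = np.argsort(d)[:k]               # indices of k nearest neighbours
    j = rng.choice(neighbors)
    lam = rng.random()                          # interpolation factor in [0, 1)
    return x + lam * (X_minority[j] - x)        # point on the segment x -> neighbour

# Toy minority class: the synthetic point lies between two real points.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(smote_sample(X))
```

This also illustrates why SMOTE struggles on images: interpolating raw pixels between two lesions rarely produces a realistic third lesion, which matches the bias we observed above.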
4. Classification Report (Precision, Recall, F1-Score)¶
Now, we compare the classification reports of the two best-performing models, although neither performs particularly well.
print("Custom CNN Classification Report:\n", classification_report(y_true, y_pred_cnn, target_names=val_generator.class_indices.keys()))
Custom CNN Classification Report:
precision recall f1-score support
akiec 0.25 0.51 0.33 65
bcc 0.38 0.63 0.48 103
bkl 0.46 0.39 0.42 220
df 0.00 0.00 0.00 23
mel 0.38 0.45 0.41 223
nv 0.91 0.80 0.85 1341
vasc 0.27 0.57 0.36 28
accuracy 0.69 2003
macro avg 0.38 0.48 0.41 2003
weighted avg 0.73 0.69 0.70 2003
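The "macro avg" and "weighted avg" rows are simply the unweighted and support-weighted means of the per-class scores. We can sanity-check this with the F1 values just printed for the Custom CNN:

```python
import numpy as np

# Per-class F1 scores and supports, copied from the Custom CNN report above.
f1      = np.array([0.33, 0.48, 0.42, 0.00, 0.41, 0.85, 0.36])
support = np.array([65, 103, 220, 23, 223, 1341, 28])

macro_f1    = f1.mean()                             # every class counts equally
weighted_f1 = (f1 * support).sum() / support.sum()  # classes weighted by support

print(round(macro_f1, 2), round(weighted_f1, 2))  # → 0.41 0.70, matching the report
```

The large gap between the two (0.41 vs. 0.70) is itself diagnostic: the weighted average is propped up by the huge "nv" class, while the macro average exposes the poor performance on rare classes like "df".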
print("ResNet50V2 Classification Report:\n", classification_report(y_true, y_pred_resnet50v2, target_names=val_generator.class_indices.keys()))
ResNet50V2 Classification Report:
precision recall f1-score support
akiec 0.13 0.06 0.08 65
bcc 0.27 0.64 0.38 103
bkl 0.39 0.47 0.43 220
df 0.09 0.04 0.06 23
mel 0.35 0.47 0.40 223
nv 0.94 0.78 0.85 1341
vasc 0.32 0.36 0.34 28
accuracy 0.67 2003
macro avg 0.36 0.40 0.36 2003
weighted avg 0.73 0.67 0.69 2003
5. Correct and Misclassified Images¶
We can see that most of the images are of "nv". It is important to note that the validation set was not balanced with SMOTE, so most of its images genuinely belong to the "nv" class. We can also see several other correct predictions.
def preprocess_image(image_path, target_size=(224, 224)):
    img = load_img(image_path, target_size=target_size)
    img_array = img_to_array(img) / 255.0
    return np.expand_dims(img_array, axis=0)
sample_images = df_test.sample(n=20)
correctly_classified = []
misclassified = []
for index, row in sample_images.iterrows():
    img_path = row['image_path']
    actual_label = row['dx']
    img_array = preprocess_image(img_path)
    predicted_label_idx = np.argmax(cnn_model.predict(img_array))
    predicted_label = label_encoder.inverse_transform([predicted_label_idx])[0]
    if predicted_label == actual_label:
        correctly_classified.append((img_path, actual_label, predicted_label))
    else:
        misclassified.append((img_path, actual_label, predicted_label))
def plot_images(image_list, title, num_images=5):
    plt.figure(figsize=(12, 6))
    plt.suptitle(title, fontsize=16)
    for i, (img_path, actual, predicted) in enumerate(image_list[:num_images]):
        plt.subplot(1, num_images, i + 1)
        img = load_img(img_path, target_size=(224, 224))
        plt.imshow(img)
        plt.axis("off")
        plt.title(f"True: {actual}\nPred: {predicted}")
    plt.show()
plot_images(correctly_classified, "Correctly Classified Images")
plot_images(misclassified, "Misclassified Images")
1/1 [==============================] - 0s 31ms/step   (…repeated for each of the 20 sampled images, ~23–48 ms each)
sample_images = df_test.sample(n=20)
correctly_classified = []
misclassified = []
for index, row in sample_images.iterrows():
    img_path = row['image_path']
    actual_label = row['dx']
    img_array = preprocess_image(img_path)
    # Use the loaded ResNet50V2 model (resnet_model was undefined here).
    predicted_label_idx = np.argmax(resnet50v2_model.predict(img_array))
    predicted_label = label_encoder.inverse_transform([predicted_label_idx])[0]
    if predicted_label == actual_label:
        correctly_classified.append((img_path, actual_label, predicted_label))
    else:
        misclassified.append((img_path, actual_label, predicted_label))
plot_images(correctly_classified, "Correctly Classified Images")
plot_images(misclassified, "Misclassified Images")
1/1 [==============================] - 1s 900ms/step   (first call includes warm-up; the remaining 19 take ~64–134 ms each)
The custom CNN achieves an accuracy of 69%, showing that it is able to distinguish between multiple classes, though it still struggles with some. This model correctly predicts a variety of classes, including akiec, bcc, bkl, mel, and nv, but it struggles with df (dermatofibroma), which it completely fails to classify. The "nv" class has the highest recall (0.80), meaning it identifies melanocytic nevi correctly in most cases, as we already mentioned.
ResNet50V2 achieves 67% accuracy, similar to the custom CNN but slightly worse in terms of balance across classes. While it performs better than EfficientNetB0 and the small dataset ResNet50, it still has low recall for rarer classes like akiec and df. However, it does classify multiple classes relatively well, with nv still dominating the correct predictions. The improved recall for bcc (0.64), bkl (0.47), and mel (0.47) indicates that ResNet50V2 captures more meaningful features than EfficientNetB0 but still struggles with class imbalance and limited real samples.
🤝 Conclusion¶
Throughout this notebook, we explored the complex task of classifying skin lesions using deep learning models.
One of the biggest takeaways from this process was the difficulty of training models on a dataset with severe class imbalance. While SMOTE was used to address this issue, synthetic data alone was not enough to create perfectly balanced classes, and some models still showed a strong bias toward the most frequent class ("nv").
Additionally, we learned that transfer learning with pre-trained models like EfficientNetB0 and ResNet50V2 provides a strong starting point but does not always guarantee optimal performance without sufficient real training data. EfficientNetB0, in particular, failed to generalize across all lesion types, highlighting that not all pre-trained models work equally well for specialized medical imaging tasks. ResNet50V2 and our custom CNN showed the best balance between generalization and class diversity.
Another major factor in this notebook was computational constraints. The models—especially the deeper ones—took a significant amount of time to train, limiting how much fine-tuning and hyperparameter optimization could be performed within the available time frame. This made training multiple models and testing different strategies a challenge, as each iteration required long processing times.
From an experimental standpoint, we tested various approaches to improve performance, including:
- Different CNN architectures (EfficientNetB0, ResNet50V2, a custom CNN, and ResNet50 trained on a smaller dataset).
- Handling imbalanced data with SMOTE, which helped but did not completely solve class imbalance issues.
- Hyperparameter tuning, which allowed us to compare different optimization techniques and dropout strategies.
- Data augmentation, which improved generalization but was not enough to overcome the dataset’s inherent class distribution challenges.
Ultimately, we achieved reasonable results with ResNet50V2 and the custom CNN, demonstrating that carefully fine-tuned architectures can achieve decent accuracy even in difficult multi-class classification problems. However, there is still room for improvement, especially in handling class imbalance and exploring different architectures better suited for medical imaging.
Some improvement steps are:
- Experimenting with more advanced balancing techniques.
- Trying other pre-trained networks.
- Using additional real-world datasets to improve generalization.
In conclusion, this project has provided valuable hands-on experience with deep learning for medical imaging, reinforcing the importance of data quality, balance, and model selection when working with complex multi-class problems. While the results were promising, they also highlighted the challenges of skin lesion classification and the need for future improvements to enhance performance further.
Grad-CAM¶
Grad-CAM is a visual explanation technique that helps us understand where a convolutional neural network (CNN) is "looking" when making a prediction. In image classification, especially in medical contexts like skin lesion diagnosis, it is important to not only predict correctly but also to ensure the model focuses on the correct parts of the image. This interpretability adds transparency and trust to model decisions.
In this section, we will:
- Apply Grad-CAM to the trained Custom CNN and ResNet50V2 (as they had the best performance and are most relevant in this notebook based on the Evaluation)
- Use test images from multiple classes
- Visualize both correctly and incorrectly classified images
- Provide heatmaps showing the models' attention
- Include a side-by-side comparison of one correct and one incorrect use to better see the difference
- Explain the findings
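Before the full pipeline, the arithmetic at the heart of Grad-CAM is worth seeing in isolation: pool the gradients of the class score over each feature map into one weight per channel, form the weighted sum of the maps, apply ReLU, and normalize for display. The sketch below uses synthetic activations and gradients (random toy tensors, not outputs of our models) purely to show the tensor shapes involved.

```python
import numpy as np

# Toy stand-ins for a conv layer's output (H, W, C) and the gradient of the
# class score w.r.t. that output. Values are random, for illustration only.
H, W, C = 4, 4, 3
rng = np.random.default_rng(42)
conv_maps = rng.random((H, W, C))
grads = rng.standard_normal((H, W, C))

alpha = grads.mean(axis=(0, 1))          # global-average-pool: one weight per channel
cam = (conv_maps * alpha).sum(axis=-1)   # weighted sum of the feature maps
cam = np.maximum(cam, 0)                 # ReLU: keep only positive class evidence
cam /= cam.max() + 1e-8                  # normalize to [0, 1] for display

print(cam.shape)  # → (4, 4): one attention value per spatial location
```

The implementation below does exactly this, except that TensorFlow's `GradientTape` computes the real gradients through the trained model instead of these toy values.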
First, we implement the full pipeline for generating and displaying Grad-CAM visualizations. We start by preprocessing the images so they are resized and normalized. Then, using the generate_gradcam function, we create class activation heatmaps that show which areas of the image influenced the model’s decision the most. The overlay_heatmap function helps us visually combine these heatmaps with the original images, making it easier to interpret where the model is “looking.” To analyze the model’s behavior, we use the get_predictions function to randomly select a few examples from the test set, both correctly and incorrectly classified.
def preprocess_image(img_path, size=(224, 224)):
    img = load_img(img_path, target_size=size)
    img_array = img_to_array(img) / 255.0
    return np.expand_dims(img_array, axis=0), img
def generate_gradcam(model, img_array, layer_name, class_idx):
    # Model mapping the input image to both the target conv layer's
    # activations and the final predictions.
    grad_model = Model(model.inputs, [model.get_layer(layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_outputs, predictions = grad_model(img_array)
        loss = predictions[:, class_idx]
    # Gradient of the class score w.r.t. the conv feature maps.
    grads = tape.gradient(loss, conv_outputs)[0]
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1))  # one weight per channel
    conv_outputs = conv_outputs[0].numpy()
    for i in range(pooled_grads.shape[0]):
        conv_outputs[:, :, i] *= pooled_grads[i].numpy()
    heatmap = np.mean(conv_outputs, axis=-1)
    heatmap = np.maximum(heatmap, 0)         # ReLU: keep positive evidence only
    heatmap /= np.max(heatmap) + 1e-8        # normalize; epsilon avoids division by zero
    return heatmap
def overlay_heatmap(img, heatmap):
    heatmap = cv2.resize(heatmap, (img.size[0], img.size[1]))
    heatmap = np.uint8(255 * heatmap)
    heatmap_color = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
    superimposed_img = cv2.addWeighted(np.array(img), 0.6, heatmap_color, 0.4, 0)
    return superimposed_img
def get_predictions(model, df_test, label_encoder, num_each=5):
    correct, incorrect = [], []
    for idx, row in df_test.sample(frac=1).iterrows():
        img_path = row['image_path']
        true_label = row['dx']
        img_array, _ = preprocess_image(img_path)
        pred_idx = np.argmax(model.predict(img_array, verbose=0)[0])
        pred_label = label_encoder.inverse_transform([pred_idx])[0]
        if pred_label == true_label and len(correct) < num_each:
            correct.append((img_path, true_label, pred_label))
        elif pred_label != true_label and len(incorrect) < num_each:
            incorrect.append((img_path, true_label, pred_label))
        if len(correct) >= num_each and len(incorrect) >= num_each:
            break
    return correct, incorrect
def visualize_gradcam_samples(model, layer_name, samples, label_encoder, title):
    plt.figure(figsize=(15, 6))
    for i, (img_path, true_label, pred_label) in enumerate(samples):
        img_array, orig_img = preprocess_image(img_path)
        class_idx = label_encoder.transform([pred_label])[0]
        heatmap = generate_gradcam(model, img_array, layer_name, class_idx)
        gradcam_img = overlay_heatmap(orig_img, heatmap)
        plt.subplot(2, 5, i + 1)
        plt.imshow(gradcam_img)
        plt.title(f"True: {true_label}\nPred: {pred_label}")
        plt.axis("off")
    plt.suptitle(title)
    plt.tight_layout()
    plt.show()
correct_cnn, incorrect_cnn = get_predictions(cnn_model, df_test, label_encoder)
cnn_last_conv_layer = "conv2d_9"
visualize_gradcam_samples(cnn_model, cnn_last_conv_layer, correct_cnn, label_encoder, "Custom CNN - Correctly Classified")
visualize_gradcam_samples(cnn_model, cnn_last_conv_layer, incorrect_cnn, label_encoder, "Custom CNN - Misclassified")
The Grad-CAM visualizations shown above offer valuable insight into how the Custom CNN model makes decisions when classifying skin lesions. In the top row, we observe five correctly classified examples, where the model accurately predicted the lesion type. The heatmaps overlayed on the original images suggest that the model was generally focusing on the central region of the lesion—where most of the distinguishing visual features are concentrated. For instance, in the correctly predicted "mel" (melanoma) and "nv" (nevus) cases, the red/yellow heatmap areas highlight the lesion's core, indicating the model’s attention was directed toward meaningful regions that likely contributed to its correct prediction.
In contrast, the bottom row illustrates five misclassified cases. In some examples, the model focused on surrounding areas or less informative regions of the lesion. Additionally, it is important to recognize that these misclassifications are not solely due to attention errors—some classes are inherently more difficult for the model to distinguish (as discussed in the Evaluation chapter).
But, overall, these visualizations demonstrate that the Custom CNN performs reasonably well in many cases by focusing on diagnostically useful regions, but also reveal that misclassifications often result from attention being diverted to irrelevant areas.
correct_resnet, incorrect_resnet = get_predictions(resnet50v2_model, df_test, label_encoder)
resnet_last_conv_layer = "conv5_block3_out"
visualize_gradcam_samples(resnet50v2_model, resnet_last_conv_layer, correct_resnet, label_encoder, "ResNet50V2 - Correctly Classified")
visualize_gradcam_samples(resnet50v2_model, resnet_last_conv_layer, incorrect_resnet, label_encoder, "ResNet50V2 - Misclassified")
Now, we apply the same function to the ResNet50V2 model. The top row displays correctly classified samples, where the heatmaps highlight the lesion areas.
In the bottom row, we see five misclassified examples. The Grad-CAM heatmaps in these cases often show scattered or unfocused attention. In some images, the model seems to concentrate on the background or outer edges rather than the lesion itself.
Next, we create a new function that compares one correctly classified image with one misclassified image side by side, making the contrast easier to see. We randomly sample from the test set until we find one example where the model's prediction matches the true label and another where it does not. Both images are then processed and passed through the model to generate class activation heatmaps, which are again overlaid on the original images to show where the model focused when making its predictions.
def show_gradcam_pair(model, df_test, label_encoder, layer_name):
    correct_sample = None
    incorrect_sample = None
    for _, row in df_test.sample(frac=1).iterrows():
        img_path = row["image_path"]
        true_label = row["dx"]
        img_array, orig_img = preprocess_image(img_path)
        pred_idx = np.argmax(model.predict(img_array, verbose=0)[0])
        pred_label = label_encoder.inverse_transform([pred_idx])[0]
        if pred_label == true_label and correct_sample is None:
            correct_sample = (img_path, true_label, pred_label)
        elif pred_label != true_label and incorrect_sample is None:
            incorrect_sample = (img_path, true_label, pred_label)
        if correct_sample and incorrect_sample:
            break
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    samples = [correct_sample, incorrect_sample]
    titles = ["Correctly Classified", "Misclassified"]
    for i, (img_path, true_label, pred_label) in enumerate(samples):
        img_array, orig_img = preprocess_image(img_path)
        class_idx = label_encoder.transform([pred_label])[0]
        heatmap = generate_gradcam(model, img_array, layer_name, class_idx)
        gradcam_img = overlay_heatmap(orig_img, heatmap)
        axes[i].imshow(gradcam_img)
        axes[i].set_title(f"{titles[i]}\nTrue: {true_label}, Pred: {pred_label}")
        axes[i].axis("off")
    plt.tight_layout()
    plt.show()
show_gradcam_pair(
    model=cnn_model,
    df_test=df_test,
    label_encoder=label_encoder,
    layer_name="conv2d_9"
)
Up close, we can now compare one correctly classified image and one misclassified image, both produced by the Custom CNN model. On the left, we see an "nv" lesion, which the model predicted correctly. The heatmap is well-focused on the lesion area, suggesting that the model attended to the right region when making its decision. On the right, however, we see a "mel" case that was incorrectly classified as "nv". Although the model still focused on the lesion, the heatmap reveals a broader, less precise attention pattern.
show_gradcam_pair(
    model=resnet50v2_model,
    df_test=df_test,
    label_encoder=label_encoder,
    layer_name="conv5_block3_out"
)
We did the same for the ResNet50V2 model. On the left, the model successfully identifies an "nv", with the heatmap clearly focused on the center of the lesion.
On the right, however, we observe a misclassification. The true class is "akiec", but the model predicted "mel". Although the heatmap shows that the model did focus on the lesion area, it may have been influenced by visual features resembling melanoma. This demonstrates that even when attention is placed on a seemingly meaningful area, the extracted features can still mislead the model. Also, as mentioned, some classes perform worse because much of their training data was synthetic, with relatively little real data available.
Conclusion of Grad-CAM Analysis¶
As a conclusion to the Grad-CAM analysis, we can reflect on both the strengths and limitations of the models used in this project. Grad-CAM provided valuable insight into how the neural networks make decisions by highlighting the areas in the image that influence predictions the most. In several correctly classified examples, the heatmaps clearly showed focused attention on the lesion itself, confirming that the model was attending to meaningful regions when making its predictions. However, misclassified cases often revealed either dispersed or misplaced attention, where the model focused on surrounding skin or unrelated textures, contributing to incorrect outcomes.
An important factor influencing these results is the quality and balance of the training data. Some classes, such as "nv", were well represented in the dataset with many real images, which helped the model learn more reliable features. In contrast, other classes had limited real data and relied heavily on synthetic samples generated by SMOTE. While this technique mitigated the class imbalance numerically, it could not fully replicate the variability and subtle patterns present in real-world medical images. This likely contributed to poorer performance and less precise attention in underrepresented classes.
Overall, Grad-CAM not only helped validate the correctness of some predictions but also exposed weaknesses in model generalization and dataset representation.
📚 References¶
- The Skin Cancer Foundation. (2024). Skin Cancer Facts & Statistics. Retrieved from https://www.skincancer.org/skin-cancer-information/skin-cancer-facts/.
- Goyal, M., Knackstedt, T., Yan, S., & Hassanpour, S. (2019). Artificial Intelligence-Based Image Classification for Diagnosis of Skin Cancer: Challenges and Opportunities. Retrieved from https://arxiv.org/abs/1911.11872.